


PATENT ABSTRACTS OF JAPAN 

(11) Publication number : 11-095793

(43)Date of publication of application : 09.04.1999 



(51) Int.Cl.



G10L 3/00 

G10L 3/00 

G10L 3/00 

G10L 5/04 

G10L 9/00 



GO 



(21) Application number : 09-252446 

(22) Date of filing: 17.09.1997 



(71) Applicant : TOSHIBA CORP

(72) Inventor : YANO TAKEHIDE
CHINO TETSURO
KONO YASUYUKI






(54) VOICE INPUT INTERPRETING DEVICE AND VOICE INPUT INTERPRETING METHOD 

(57)Abstract: 

PROBLEM TO BE SOLVED: To obtain a device capable of interpreting an input
voice so that an application part can operate properly even when a user does not
correctly remember the word or sentence to be uttered, by detecting in the input voice
a part in which a portion of a regular vocabulary item has been replaced by a substitute
expression, and replacing the detected part with the corresponding regular expression.
SOLUTION: This device detects, in the input voice, a part in which a portion of a
regular vocabulary item has been replaced by a substitute expression, and replaces that
part with the corresponding regular expression. In this device, a vocabulary storage 102
is connected to a voice analyzing part 101 and stores information on vocabulary items in
which a portion of a regular vocabulary item is replaced with a wild card expression,
that is, an expression that can stand in for any number of arbitrary words, such as
'some' or 'ra, ra, ra'. With this device, even though the user does not correctly
remember, for example, the name 'TOKYO stay-in hotel', when the user performs a voice
input such as 'TOKYO ra, ra, ra hotel' using a wild card expression, the name in the
voice input is interpreted as the proper name and the information can be output to the
application part.
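
The lookup described in the abstract can be illustrated with a minimal sketch. It works on recognized text rather than on speech, and the vocabulary list, the wild card list, and the function name `resolve` are hypothetical choices made for illustration only; the publication does not specify an implementation.

```python
# Hypothetical data: entries other than 'TOKYO stay-in hotel' are invented
# for illustration and do not come from the publication.
REGULAR_VOCABULARY = ["TOKYO stay-in hotel", "OSAKA stay-in hotel", "TOKYO tower"]
WILDCARDS = ["ra, ra, ra", "some"]  # wild card expressions named in the abstract


def resolve(recognized: str) -> list:
    """Return regular vocabulary entries consistent with the recognized text."""
    for wildcard in WILDCARDS:
        if wildcard in recognized:
            # Split around the wild card; the remaining parts must match.
            prefix, _, suffix = recognized.partition(wildcard)
            prefix, suffix = prefix.strip(), suffix.strip()
            return [
                entry for entry in REGULAR_VOCABULARY
                if entry.startswith(prefix)
                and entry.endswith(suffix)
                and len(entry) > len(prefix) + len(suffix)  # something was replaced
            ]
    # No wild card detected: accept only an exact regular vocabulary entry.
    return [recognized] if recognized in REGULAR_VOCABULARY else []


print(resolve("TOKYO ra, ra, ra hotel"))
```

With the assumed vocabulary, the input 'TOKYO ra, ra, ra hotel' resolves to the single entry 'TOKYO stay-in hotel'. When several entries remain, some further ranking step would be needed.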
* NOTICES *



Japan Patent Office is not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer, so the translation may not reflect the original precisely.
2. **** shows a word which cannot be translated.
3. In the drawings, any words are not translated.



CLAIMS 



[Claim(s)] 

[Claim 1] Voice input interpretation equipment which interprets input voice and outputs information on the corresponding
vocabulary, characterized by comprising: a means to store 1st information about a regular vocabulary, and 2nd information
about the regular vocabulary that allows for voice input in which a part of the regular vocabulary is replaced by an
alternative expression defined beforehand; a means to perform speech recognition on the input voice; a means to detect the
alternative expression in the speech recognition result based on the 2nd information; and a means to search the 1st
information and obtain the corresponding vocabulary, when the alternative expression is detected in the recognition result by
the detecting means, based at least on the portion of the vocabulary other than the alternative expression included in the
recognition result of the input voice.

[Claim 2] Voice input interpretation equipment according to claim 1, characterized by further comprising a means to
evaluate, when two or more corresponding vocabularies are found by the search, the priority of each corresponding vocabulary
based at least on the phonological features of the voice corresponding to the alternative expression.

[Claim 3] Voice input interpretation equipment which interprets input voice and outputs information on the corresponding
vocabulary, characterized by comprising: a lexical storage means to store, as a kind of vocabulary, expressions in which a
part of a regular vocabulary serving as a speech recognition object is replaced by an alternative expression that is defined
beforehand and used as a substitute for arbitrary words; a rhythm information-storage means to store the notation and rhythm
information of those regular vocabularies, among the vocabularies stored in the lexical storage means, that do not include
the alternative expression; a voice-analysis means to perform speech recognition and rhythm analysis, with reference to the
lexical storage means, on the voice inputted through an audio input unit; and a permutation representation collating means
to replace the portion of the alternative expression with the corresponding portion of the regular vocabulary, with reference
to the rhythm information-storage means, based on the speech recognition result and the rhythm analysis result obtained for
the inputted voice by the voice-analysis means.

[Claim 4] Voice input interpretation equipment equipped with a means to analyze and perform speech recognition on the voice
inputted from an audio input unit and to output a voice-analysis result including the speech recognition result, and with a
lexical storage means to store the vocabulary serving as a candidate for recognition when the speech recognition is
performed, characterized by comprising: an alternative expression storage means to store alternative expressions used as
substitutes for arbitrary words; an alternative expression detection means to detect, in the inputted speech information, an
expression identical to a vocabulary stored in the alternative expression storage means; a permutation representation storage
means to store words obtained by further dividing the vocabulary stored in the lexical storage means into separate words; and
a processing means to perform speech recognition on the portion other than the alternative expression in the input speech
information in which the alternative expression was detected by the alternative expression detection means, using the
vocabulary stored in the permutation representation storage means as the speech recognition object, and to search the
vocabulary stored in the permutation representation storage means, using the speech recognition result, for a vocabulary
appropriate as the words replaced by the alternative expression.

[Claim 5] Voice input interpretation equipment according to claim 4, characterized in that the processing means performs the
speech recognition per syllable or phoneme and, by referring to the recognition result in syllable or phoneme units, detects
a portion in which a part of the regular vocabulary was uttered mixed into the alternative expression, and, when searching
the vocabulary stored in the permutation representation storage means for the expression replaced by the alternative
expression, preferentially chooses an expression that matches the detection result.

[Claim 6] Voice input interpretation equipment according to claim 4, characterized in that the alternative expression
detection means analyzes the rhythm of the input voice, and the processing means, when searching the vocabulary stored in the
permutation representation storage means for the expression replaced by the alternative expression, preferentially chooses
words that match or approximate the rhythm conditions obtained as a result of the analysis.

[Claim 7] A voice input interpretation method which interprets input voice and outputs information on the corresponding
vocabulary, characterized by: performing speech recognition on the input voice; detecting an alternative expression defined
beforehand in the speech recognition result, based on information about a regular vocabulary defined beforehand that allows
for voice input in which a part of the regular vocabulary is replaced by the alternative expression; and, when the
alternative expression is detected in the recognition result, searching the information about the regular vocabulary defined
beforehand, based at least on the portion of the vocabulary other than the alternative expression included in the recognition
result of the input voice, and obtaining the corresponding vocabulary.

[Claim 8] The voice input interpretation method according to claim 7, characterized by evaluating, when two or more
corresponding vocabularies are found by the search, the priority of each corresponding vocabulary based at least on the
phonological features of the voice corresponding to the alternative expression.

[Claim 9] A voice input interpretation method which interprets input voice and outputs information on the corresponding
vocabulary, characterized by: performing speech recognition and rhythm analysis on the voice inputted through an audio input
unit, with reference to a lexical storage means that stores, as a kind of vocabulary, expressions in which a part of a regular
vocabulary serving as a speech recognition object is replaced by an alternative expression defined beforehand and used as a
substitute for arbitrary words; and replacing the portion of the alternative expression with the corresponding portion of the
regular vocabulary, based on the speech recognition result and the rhythm analysis result for the inputted voice, with
reference to a rhythm information-storage means that stores the notation and rhythm information of those regular
vocabularies, among the vocabularies stored in the lexical storage means, that do not include the alternative expression.

[Claim 10] A voice input interpretation method which interprets input voice through speech recognition and outputs
information on the corresponding vocabulary among the vocabularies of a lexical storage means that stores the vocabulary
serving as a candidate for recognition when the speech recognition is performed, characterized by: detecting, in the inputted
speech information, an expression identical to a vocabulary stored in an alternative expression storage means that stores
alternative expressions used as substitutes for arbitrary words; performing speech recognition on the portion other than the
alternative expression in the input speech information in which the alternative expression was detected, using as the speech
recognition object the vocabulary stored in a permutation representation storage means that stores words obtained by further
dividing the vocabulary stored in the lexical storage means into separate words; and searching the vocabulary stored in the
permutation representation storage means, using the speech recognition result, for a vocabulary appropriate as the words
replaced by the alternative expression.

[Claim 11] The voice input interpretation method according to claim 10, characterized in that, in searching the vocabulary,
the speech recognition is performed per syllable or phoneme and, by referring to the recognition result in syllable or
phoneme units, a portion in which a part of the regular vocabulary was uttered mixed into the alternative expression is
detected, and, when searching the vocabulary stored in the permutation representation storage means for the expression
replaced by the alternative expression, an expression that matches the detection result is preferentially chosen.

[Claim 12] The voice input interpretation method according to claim 11, characterized by preferentially choosing, when
searching the vocabulary stored in the permutation representation storage means for the expression replaced by the
alternative expression, words that match or approximate the rhythm conditions obtained as a result of analyzing the rhythm of
the input voice.
DETAILED DESCRIPTION



[Detailed Description of the Invention] 
[0001] 

[The technical field to which invention belongs] This invention relates to voice input interpretation equipment and a voice
input interpretation method for interpreting input voice.

[0002] 

[Description of the Prior Art] In recent years, in computing systems including personal computers, it has become possible to
input speech information in addition to input by a conventional keyboard and mouse.
[0003] Moreover, with progress in natural-language analysis, natural-language generation, speech recognition, speech
synthesis, interactive-processing technology and the like, demand is increasing for voice dialog systems that converse with a
user by voice input/output, and various voice dialog systems are being developed, such as "TOSBURG-II" (Transactions of the
Institute of Electronics, Information and Communication Engineers, Vol. J77-D-II, No. 8, pp. 1417-1428, 1994), a dialog
system that can be used by voice input with free utterance.

[0004] Since voice input as used in such a voice dialog system does not require the mastery that a keyboard in particular
does, and can be handled by anyone, its use in social systems used by everyone is expected, and the demand for more advanced
speech processing techniques is increasing.

[0005] Conventionally, voice input is interpreted as follows. The voice inputted by a user through a microphone or the like
is captured. Candidate segments to be analyzed are estimated from, for example, signal strength, and feature patterns are
extracted by analyzing each segment using, for example, an FFT (fast Fourier transform). The extracted pattern is collated
with standard patterns prepared beforehand using, for example, the compound similarity method, the DP (dynamic programming)
method, or an HMM (hidden Markov model), and the inputted voice is thereby recognized. The speech recognition result is then
subjected to syntax analysis, semantic analysis, etc., to extract the semantic content of the user's input and the utterance
intention.
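
As an illustration of the DP collation step mentioned above, the following minimal sketch compares a one-dimensional feature sequence against standard patterns using dynamic time warping. The scalar features, the absolute-difference local cost, and the names `dp_distance` and `recognize` are assumptions made for illustration; real systems would use multi-dimensional spectral features, and the publication does not give these details.

```python
def dp_distance(pattern, template):
    """DP (dynamic time warping) distance between two 1-D feature sequences."""
    inf = float("inf")
    rows, cols = len(pattern), len(template)
    # cost[i][j] is the best accumulated cost of aligning the first i frames
    # of the pattern with the first j frames of the template.
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(pattern[i - 1] - template[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # pattern frame repeated against template
                cost[i][j - 1],      # template frame repeated against pattern
                cost[i - 1][j - 1],  # both advance
            )
    return cost[rows][cols]


def recognize(pattern, templates):
    """Return the label of the standard pattern with the smallest DP distance."""
    return min(templates, key=lambda label: dp_distance(pattern, templates[label]))


# Hypothetical standard patterns.
templates = {"a": [1, 2, 3], "b": [5, 5, 5]}
print(recognize([1, 1, 2, 3, 3], templates))
```

Because the alignment may stretch either sequence, an utterance spoken more slowly than the standard pattern still matches it closely, which is the reason DP matching suits speech.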

[0006] Conventionally, when speech recognition was performed in the voice input interpretation method of such voice dialog
systems, the input was collated with the patterns of words or texts prepared beforehand. With this method, however, the user
needed to memorize clearly the words or texts that could be spoken (namely, the words or texts that the system could
interpret), which placed a burden on the user.

[0007] Furthermore, when the user had memorized only a part of a word or text that could be spoken, even if the user
inputted the memorized part, it was regarded as voice input different from the patterns prepared beforehand and incorrect
recognition arose. As a result, operation contrary to the intention of the user was output in many cases, which placed a
burden on the user.

[0008] For example, take a social system with a guidance task. When the only information the user knows about the "Tokyo
stay in hotel" is the partial name "Tokyo ... hotel", and the user tries to look up information on that hotel by inputting
"Tokyo something hotel", the input differs from the patterns of the names of actually existing hotels prepared beforehand in
the system. Incorrect recognition therefore arises, information contrary to the intention of the user is presented, and the
user gains nothing.

[0009] Moreover, suppose the user has memorized only the rhythm of a word or text that can be spoken (or that the user
naturally expects to be registered in the system). In the conventional system, even if the user inputted another word or text
that retained only the rhythm of that word or text, it could not be accepted as a formal input and incorrect recognition
arose, so the operation the user intended was not performed, which placed a burden on the user.

[0010] For example, take again a social system with a guidance task. Suppose a user wants to obtain information about "the
mouth hotel of a round head", and the information the user has is only the rhythm of "the mouth hotel of a round head" and
the partial name "... hotel". To look up the information about the hotel, the user, meaning "something hotel", utters (or
imitates) as best he can an input such as "RARARARARA hotel", "HONYARARARA hotel", or "tare RARARA hotel" while being
conscious of the rhythm that "the mouth hotel of a round head" has. Even so, incorrect recognition arises, information
contrary to the intention of the user is presented, and the user gains nothing.

[0011] As shown above, since the conventional voice input interpretation method could understand only the patterns of words
or texts prepared beforehand, it placed a great burden on the user.

[0012] 
[Problem(s) to be Solved by the Invention] Thus, when the conventional voice input interpretation method was applied in
equipment with voice input, the patterns of words or texts accepted as voice input were limited to those registered
beforehand, so the user needed to memorize clearly the texts that could be uttered, and there was a problem that the user's
burden increased.

[0013] Moreover, when the user had memorized only a part of a word or text that could be spoken, even if the user inputted
the memorized part, it was regarded as voice input different from the patterns prepared beforehand and incorrect recognition
arose. As a result, operation contrary to the intention of the user was output in many cases, and there was a problem that
the user's burden increased.

[0014] Moreover, in the conventional system, when the user had memorized only the rhythm of a word or text that could be
spoken, the input could not be accepted as a formal input and incorrect recognition arose, so the operation the user intended
was not performed, and there was a problem that the user's burden increased.

[0015] This invention was made in consideration of the above circumstances, and aims at offering voice input interpretation
equipment which can interpret input voice so that an application portion operates appropriately even if a user does not
correctly memorize the word or text which can be uttered.

[0016] Moreover, this invention aims at offering voice input interpretation equipment which suppresses incorrect speech
recognition even when the user has memorized only a part of a word or text which can be uttered, and which can bring the
output of a system with voice input into line with the intention of the user.

[0017] Moreover, this invention aims at offering voice input interpretation equipment and a voice input interpretation
method which suppress incorrect speech recognition even when the user has memorized only the rhythm of a word or text which
can be uttered, and which bring the output of a system with voice input into line with the intention of the user.
[0018] 

[Means for Solving the Problem] This invention (claim 1) is voice input interpretation equipment which interprets input voice
and outputs information on the corresponding vocabulary, characterized by having: a means to store 1st information about a
regular vocabulary, and 2nd information about the regular vocabulary that allows for voice input in which a part of the
regular vocabulary is replaced by an alternative expression defined beforehand; a means to perform speech recognition on the
input voice; a means to detect the alternative expression in the speech recognition result based on the 2nd information; and
a means to search the 1st information and obtain the corresponding vocabulary, when the alternative expression is detected in
the recognition result by the detecting means, based at least on the portion of the vocabulary other than the alternative
expression included in the recognition result of the input voice.

[0019] Preferably, the equipment may further have a means to evaluate, when two or more corresponding vocabularies are found
by the search, the priority of each corresponding vocabulary based at least on the phonological features of the voice
corresponding to the alternative expression.

[0020] This invention (claim 3) is voice input interpretation equipment which interprets input voice and outputs information
on the corresponding vocabulary, characterized by having: a lexical storage means to store, as a kind of vocabulary,
expressions in which a part of a regular vocabulary serving as a speech recognition object is replaced by an alternative
expression that is defined beforehand and used as a substitute for arbitrary words; a rhythm information-storage means to
store the notation and rhythm information of those regular vocabularies, among the vocabularies stored in the lexical storage
means, that do not include the alternative expression; a voice-analysis means to perform speech recognition and rhythm
analysis, with reference to the lexical storage means, on the voice inputted through an audio input unit; and a permutation
representation collating means to replace the portion of the alternative expression with the corresponding portion of the
regular vocabulary, with reference to the rhythm information-storage means, based on the speech recognition result and the
rhythm analysis result obtained for the inputted voice by the voice-analysis means.

[0021] According to this invention, even if the user has not clearly memorized the vocabulary stored in the lexical storage
means, the user can perform voice input using an alternative expression for the portion not clearly remembered; the suitable
expression corresponding to the inputted alternative expression is searched for, and the input can be replaced by the suitable
vocabulary that does not include the alternative expression.

[0022] This invention (claim 4) is voice input interpretation equipment equipped with a means to analyze and perform speech
recognition on the voice inputted from an audio input unit and to output a voice-analysis result including the speech
recognition result, and with a lexical storage means to store the vocabulary serving as a candidate for recognition when the
speech recognition is performed, characterized by having: an alternative expression storage means to store alternative
expressions used as substitutes for arbitrary words; an alternative expression detection means to detect, in the inputted
speech information, an expression identical to a vocabulary stored in the alternative expression storage means; a permutation
representation storage means to store words obtained by further dividing the vocabulary stored in the lexical storage means
into separate words; and a processing means to perform speech recognition on the portion other than the alternative
expression in the input speech information in which the alternative expression was detected by the alternative expression
detection means, using the vocabulary stored in the permutation representation storage means as the speech recognition
object, and to search the vocabulary stored in the permutation representation storage means, using the speech recognition
result, for a vocabulary appropriate as the words replaced by the alternative expression.

[0023] According to this invention, even if the user has not clearly memorized the vocabulary stored in the lexical storage
means, the user can perform voice input using an alternative expression for the portion not clearly remembered; the
expression serving as a substitute for arbitrary words can be detected in the voice input, and the suitable expression
corresponding to the detected alternative expression can be searched for.

[0024] Preferably, the processing means may perform the speech recognition per syllable or phoneme and, by referring to the
recognition result in syllable or phoneme units, detect a portion in which a part of the regular vocabulary was uttered mixed
into the alternative expression, and, when searching the vocabulary stored in the permutation representation storage means
for the expression replaced by the alternative expression, preferentially choose an expression that matches the detection
result.

[0025] By this, for voice input in which a part of the correct utterance is mixed into the user's alternative expression, a
suitable expression can be searched for by relying on the information on that part of the correct utterance.
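
A minimal sketch of this preference is given below, assuming that the syllable-level recognition result is already available as a list of syllables and that the wild card is built from a single filler syllable. The filler `"ra"`, the syllable spellings, the candidate table, and the function names are all hypothetical and serve only to illustrate the idea.

```python
FILLER = "ra"  # assumed filler syllable of the wild card expression


def fragments(syllables):
    """Syllables inside the wild card portion that are not filler."""
    return [s for s in syllables if s != FILLER]


def score(candidate_syllables, frags):
    """Count the fragments that occur in the candidate in the same order."""
    position, matched = 0, 0
    for frag in frags:
        try:
            position = candidate_syllables.index(frag, position) + 1
            matched += 1
        except ValueError:
            continue  # fragment not found after the current position
    return matched


def prefer(candidates, syllables):
    """Choose the candidate that best agrees with the uttered fragments."""
    frags = fragments(syllables)
    return max(candidates, key=lambda name: score(candidates[name], frags))


# Hypothetical candidates for the replaced portion, with assumed syllables.
candidates = {"stay-in": ["su", "te", "i", "n"], "bay": ["be", "i"]}
print(prefer(candidates, ["su", "ra", "ra", "n"]))
```

Here the user is assumed to have uttered "su" and "n" correctly among the filler syllables, so the candidate containing both in that order is chosen ahead of the other.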

[0026] Preferably, the alternative expression detection means may analyze the rhythm of the input voice, and the processing
means, when searching the vocabulary stored in the permutation representation storage means for the expression replaced by
the alternative expression, may preferentially choose words that match or approximate the rhythm conditions obtained as a
result of the analysis.
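
As a rough illustration of choosing by rhythm, the sketch below reduces rhythm to a single number, the count of rhythmic units (for example morae) in the wild card portion, and ranks candidates by how closely their own counts approximate it. This reduction, the candidate names, and their counts are assumptions made for illustration; the publication does not define rhythm in these terms.

```python
def rank_by_rhythm(wildcard_units, candidates):
    """Order candidates so that the closest rhythmic length comes first."""
    return sorted(candidates, key=lambda name: abs(candidates[name] - wildcard_units))


# Hypothetical hotel names with assumed unit counts for the replaced portion.
candidates = {
    "SAKURA hotel": 3,
    "YAMANOTE hotel": 4,
    "HIGASHIYAMA hotel": 5,
}

# "RARARARARA hotel": the wild card portion is assumed to have 5 units.
print(rank_by_rhythm(5, candidates))
```

A candidate with exactly the same count is ranked first, and near misses follow, which corresponds to choosing words that "match or approximate" the analyzed rhythm.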

[0027] This invention (claim 7) is a voice input interpretation method which interprets input voice and outputs information
on the corresponding vocabulary, characterized by: performing speech recognition on the input voice; detecting an alternative
expression defined beforehand in the speech recognition result, based on information about a regular vocabulary defined
beforehand that allows for voice input in which a part of the regular vocabulary is replaced by the alternative expression;
and, when the alternative expression is detected in the recognition result, searching the information about the regular
vocabulary defined beforehand, based at least on the portion of the vocabulary other than the alternative expression included
in the recognition result of the input voice, and obtaining the corresponding vocabulary.

[0028] Preferably, when two or more corresponding vocabularies are found by the search, the priority of each corresponding
vocabulary may be evaluated based at least on the phonological features of the voice corresponding to the alternative
expression.

[0029] This invention (claim 9) is a voice input interpretation method which interprets input voice and outputs information
on the corresponding vocabulary, characterized by: performing speech recognition and rhythm analysis on the voice inputted
through an audio input unit, with reference to a lexical storage means that stores, as a kind of vocabulary, expressions in
which a part of a regular vocabulary serving as a speech recognition object is replaced by an alternative expression defined
beforehand and used as a substitute for arbitrary words; and replacing the portion of the alternative expression with the
corresponding portion of the regular vocabulary, based on the speech recognition result and the rhythm analysis result for the
inputted voice, with reference to a rhythm information-storage means that stores the notation and rhythm information of those
regular vocabularies, among the vocabularies stored in the lexical storage means, that do not include the alternative
expression.

[0030] This invention (claim 10) is a voice input interpretation method which interprets input voice through speech
recognition and outputs information on the corresponding vocabulary among the vocabularies of a lexical storage means that
stores the vocabulary serving as a candidate for recognition when the speech recognition is performed, characterized by:
detecting, in the inputted speech information, an expression identical to a vocabulary stored in an alternative expression
storage means that stores alternative expressions used as substitutes for arbitrary words; performing speech recognition on
the portion other than the alternative expression in the input speech information in which the alternative expression was
detected, using as the speech recognition object the vocabulary stored in a permutation representation storage means that
stores words obtained by further dividing the vocabulary stored in the lexical storage means into separate words; and
searching the vocabulary stored in the permutation representation storage means, using the speech recognition result, for a
vocabulary appropriate as the words replaced by the alternative expression.

[0031] Preferably, in searching the vocabulary, the speech recognition may be performed per syllable or phoneme and, by
referring to the recognition result in syllable or phoneme units, a portion in which a part of the regular vocabulary was
uttered mixed into the alternative expression may be detected, and, when searching the vocabulary stored in the permutation
representation storage means for the expression replaced by the alternative expression, an expression that matches the
detection result may be preferentially chosen.

[0032] Preferably, when searching the vocabulary stored in the permutation representation storage means for the expression
replaced by the alternative expression, words that match or approximate the rhythm conditions obtained as a result of
analyzing the rhythm of the input voice may be preferentially chosen.

[0033] According to this invention, a function to detect the wild card expression used as a substitute for a clear
expression, and a function to search for and substitute the suitable expression it stands for, are added to the
voice-analysis function; alternatively, a function to search for and substitute the suitable expression is added to a
voice-analysis function accompanied by a lexical storage means holding vocabulary in which a part is actually replaced by a
wild card expression. By this, even when the user has memorized only a part of the vocabulary which can be uttered, it becomes
possible to accept and interpret voice input that uses a wild card expression.

[0034] Moreover, according to this invention, even when the user remembers only the rhythm of the vocabulary to be uttered,
voice input that uses the corresponding wild card expression can be accepted and interpreted.

[0035] Thus, this invention has the great practical effect of making it possible to build flexible voice input interpretation
equipment that can accept and interpret voice input even if the user does not clearly remember the vocabulary that the
equipment permits as voice input.
[0036] 

[Embodiments of the Invention] Hereafter, embodiments of the invention are explained with reference to the drawings.
[0037] (1st embodiment) The 1st embodiment of this invention is explained first.

[0038] Drawing 1 shows an example of the composition of the voice input interpretation equipment according to this
embodiment. As shown in drawing 1, the voice input interpretation equipment 1 of this embodiment is equipped with the
voice-analysis section 101, the lexical storage section 102, the permutation representation collating section 103, and the
rhythm information-storage section 104. The A/D converter that changes input voice from an analog signal into a digital
signal may be provided either in the voice input interpretation equipment 1 or on the audio input unit 100 side.

[0039] The voice-analysis section 101 connects with the permutation representation collating section 103, the lexical storage
section 102, and the audio input unit 100, such as a microphone. It performs continuation word speech recognition on the
vocabulary recorded in the lexical storage section 102, for example by the method described in "the voice-recognition
algorithm of the continuation word by the method of superposition, and continuation syllable" (an electronic-intelligence
communication society paper magazine, J-66-D, 6, pp.637-644). It also analyzes the pitch pattern information of the voice and
the like, for example by a method such as the one described in "keyword spotting using pitch pattern information" (the
Acoustical Society of Japan lecture collected works, September, Heisei 8, pp.29-30), and generates a rhythm parameter. It
then passes the information shown in drawing 4 to the permutation representation collating section 103. Methods other than
those cited above may be used both for continuation word speech recognition and for generating the rhythm parameter.

[0040] The lexical storage section 102 connects to the voice-analysis section 101 and records the vocabulary for speech
recognition. It stores information as shown in drawing 2 for each regular vocabulary (in this case, the items of drawing 2
that concern wild card expression do not exist). It also stores information as shown in drawing 2 for each vocabulary in
which a part of a regular vocabulary is replaced by wild card expression, that is, by an expression such as "being",
"somehow", or "HONYARARA" that stands in for an arbitrary number of words.

[0041] The details of the information in drawing 2 are described later; the notational convention of "idea" information is
described first. Because the continuation word speech recognition performed in the voice-analysis section 101 expresses a
recognition result as a sequence of two or more words, the boundary between words is written with the sign "/" (slash). The
following explanation also uses the sign "/" to mark these word boundaries.

[0042] Moreover, two kinds of wild card expression are defined, and either one or both may be used: the number word
substitution word, an expression such as "somehow" that is taken to replace some number of words, and the rhythm word, an
expression such as "HONYARARA" that is taken to convey the rhythm of the expression it replaces. The concrete content and
the number of kinds of the number word substitution words and rhythm words to be used may be defined as suits the system.

[0043] Drawing 3 shows an example of the vocabulary generated from the "Tokyo stay in hotel" and, as wild card expressions,
the number word substitution word "somehow" and the rhythm word "HONYARARA". As the drawing shows, vocabulary is
generated in which wild card expression replaces some of the words "Tokyo", "stay in", and "hotel". In particular, for a rhythm
word like "HONYARARA", vocabulary is generated in which the rhythm word is extended to a length equal to that of the
expression it replaces (in this case the length is adjusted by the number of "RA").
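
The generation described for drawing 3 can be sketched as follows. This is only an illustrative sketch, not the method of the specification: the function names, the mora counts given for each word, the rule that "HONYA" counts as two morae padded with one "RA" per further mora, and the choice to leave at least one regular word in every variant are all assumptions made for the example.

```python
def rhythm_word(morae: int) -> str:
    """Build a rhythm word of roughly `morae` morae.

    Assumption: 'HONYA' counts as 2 morae and each 'RA' adds one,
    so 4 morae gives 'HONYARARA' and 5 gives 'HONYARARARA'.
    """
    return "HONYA" + "RA" * max(1, morae - 2)


def wildcard_variants(words, morae, substitution_word="somehow"):
    """Replace each contiguous run of words by a wild card expression.

    `words` is the regular vocabulary split into words and `morae`
    gives a (hypothetical) mora count for each word. Runs covering the
    whole vocabulary are skipped so one regular word always remains.
    """
    variants = set()
    n = len(words)
    for start in range(n):
        for end in range(start + 1, n + 1):
            if end - start == n:
                continue  # assumption: never replace every word
            run_morae = sum(morae[start:end])
            for wild in (substitution_word, rhythm_word(run_morae)):
                parts = words[:start] + [wild] + words[end:]
                variants.add(" / ".join(parts))
    return variants


# Hypothetical mora counts for "Tokyo", "stay in", "hotel".
variants = wildcard_variants(["Tokyo", "stay in", "hotel"], [4, 5, 3])
```

With these assumed counts the set contains both "Tokyo / somehow / hotel" and "Tokyo / HONYARARARA / hotel", where the rhythm word has been lengthened to match the five morae assumed for "stay in".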

[0044] Drawing 2 is the list of the information recorded in the lexical storage section 102, shown together with the example
of the vocabulary "the Tokyo HONYARARA hotel". "Idea" information represents the character string of the vocabulary. In
the example of drawing 2, it is recorded as "Tokyo / HONYARARA / hotel", a 3 word ******** idea. "Existence of wild card
expression" information shows whether the vocabulary contains the wild card expression described earlier. In this case,
"****" is recorded, because "HONYARARA" is a wild card expression. "Expressional kind" information shows, for each word
contained in the vocabulary, whether it is a wild card expression or a non-wild card expression. "Alternative" is given to a
word that is wild card expression and "decision" is given to non-wild card expression. In this example, since the words
"Tokyo" and "hotel" are non-wild card expressions and "HONYARARA" is a wild card expression, the information (decision /
alternative / decision) is given. "Kind of wild card expression" information shows whether the wild card expression included
in the vocabulary is a number word substitution word or a rhythm word. In this case, since "HONYARARA" is defined as a
rhythm word, "rhythm word" is recorded. "Speech recognition parameter" information describes any parameter needed for the
speech recognition performed in the voice-analysis section 101 (since the speech recognition method used here is not the
essence of this invention, the detailed explanation of this parameter is omitted).
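
As a rough illustration of the record described for drawing 2, the fields could be held in a structure like the one below. The class name, field names, and types are hypothetical, and deriving the "existence of wild card expression" flag from the expressional kinds instead of storing it is a simplification made here, not something the specification states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LexiconEntry:
    """One vocabulary record in the spirit of drawing 2 (names assumed)."""
    idea: str                            # e.g. "Tokyo / HONYARARA / hotel"
    expression_kinds: List[str]          # "decision" or "alternative" per word
    wildcard_kind: Optional[str] = None  # "rhythm word", "number word substitution word"
    recognition_params: Dict[str, float] = field(default_factory=dict)

    @property
    def words(self) -> List[str]:
        # Words are separated by "/" as in the notation of the text.
        return [w.strip() for w in self.idea.split("/")]

    @property
    def has_wildcard(self) -> bool:
        # Derived here; the text describes it as a separately stored item.
        return "alternative" in self.expression_kinds


entry = LexiconEntry(
    idea="Tokyo / HONYARARA / hotel",
    expression_kinds=["decision", "alternative", "decision"],
    wildcard_kind="rhythm word",
)
```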
[0045] Drawing 4 is the list of the information passed from the voice-analysis section 101 to the permutation representation
collating section 103, shown together with the example in which the "Tokyo HONYARARARA hotel" is inputted. "Recognition
result" information represents the result of the continuation word recognition carried out in the voice-analysis section 101.
In the example of drawing 4, "Tokyo / HONYARARARA / hotel" is shown as the recognition result of the inputted sound
signal. "Word utterance time" information shows the utterance time of each word obtained when continuation word recognition
is carried out in the voice-analysis section 101. In this example (650msec/820msec/510msec) is shown; these numbers are the
utterance times of "Tokyo", "HONYARARARA", and "hotel", in order. "Rhythm parameter" information shows the rhythm
parameter analyzed in the voice-analysis section 101. The form of this information depends on the means used to analyze the
rhythm parameter; the case shown here uses the time transition of intonation or fundamental frequency, and the rhythm
parameter obtained is expressed in ** type. The arrow sign "->" used in drawing 4 expresses the intonation of the language
in ** type: an arrow placed high marks a high portion of the intonation, and an arrow placed low marks a low portion.
"Existence of wild card expression" information shows whether the inputted voice contains the wild card expression
described earlier. In this case, "****" is outputted, because "HONYARARARA" is a wild card expression. "Expressional kind"
information shows, for each word contained in the recognition result, whether it is a wild card expression or a non-wild
card expression. This information is obtained by referring to the "expressional kind" information of drawing 2 for the
corresponding vocabulary, and its notation is the same as that of the "expressional kind" information in drawing 2. In the
example of drawing 4, since the words "Tokyo" and "hotel" are non-wild card expressions and "HONYARARARA" is a wild card
expression, the information (decision / alternative / decision) is given. "Kind of wild card expression" information
discriminates whether the wild card expression included in the input voice is a number word substitution word or a rhythm
word. In this case, since "HONYARARARA" is a rhythm word, "rhythm word" is outputted.

[0046] The permutation representation collating section 103 connects with the voice-analysis section 101 and the rhythm
information-storage section 104 and, when wild card expression is detected, collates the suitable expression corresponding to
the wild card expression portion. The details of this section are described later.

[0047] The rhythm information-storage section 104 connects with the permutation representation collating section 103 and
records information as shown in drawing 5 for the regular vocabulary, that is, the vocabulary registered in the lexical
storage section 102 that does not include wild card expression.

[0048] Drawing 5 is the list of the information recorded in the rhythm information-storage section 104, shown together with
the example of the "Tokyo stay in hotel". "Idea" information is the idea information of the vocabulary. "Allowed-time"
information expresses the phonation time of a sample of the recorded language. When the vocabulary can be separated as a
continuation word, the phonation time of each word is recorded. The notation of this information is the same as that of the
"word utterance time" information of drawing 4. "Rhythm" information expresses the rhythm information analyzed from a
sample of the recorded language. However, the method of analyzing rhythm information must be the same as the method used
in the voice-analysis section 101. Moreover, the rhythm information outputted from the rhythm information-storage section
104 must have the same form as the rhythm parameter information passed from the voice-analysis section 101 to the
permutation representation collating section 103. As in drawing 4, the example of drawing 5 expresses the rhythm information
obtained by analysis in ** type.

[0049] Drawing 6 is the flow chart of the operation of the permutation representation collating section 103, which plays the
key role in this embodiment. Hereafter, the flow of processing is explained with reference to drawing 6.
[0050] (Step S101) Here, it is checked whether the speech recognition result of the voice-analysis section 101 contains wild
card expression. This can be checked from the "existence of wild card expression" information of drawing 4 passed from the
voice-analysis section 101. When wild card expression exists, processing proceeds to Step S102; when it does not exist, the
recognition result is outputted as it is and processing ends.

[0051] (Step S102) At this step, the vocabulary that fits the passed speech recognition result and is a candidate for output
is chosen from the rhythm information-storage section 104. For example, the idea information of the information (drawing 5)
recorded in the rhythm information-storage section 104 is used. The non-wild card expression portions contained in the
speech recognition result are found by referring to the expressional kind information of the information (drawing 4) passed
from the voice-analysis section 101. Vocabulary is then chosen that satisfies the position conditions imposed by those
non-wild card expression portions, on the condition that each wild card expression has a length of one or more words.

[0052] For example, suppose the obtained speech recognition result is "Tokyo / what / hotel" and the expressional kind
information is (decision / alternative / decision). Then vocabulary such as "Tokyo / stay in / hotel" and "Tokyo / ENTA /
continental / hotel", in which the word "Tokyo" comes first, the word "hotel" comes last, and at least one word exists
between them, is chosen as fitting vocabulary.
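
A minimal sketch of this selection is given below, assuming each vocabulary is already split into words and that every wild card ("alternative") word stands for one or more candidate words. The function names and the word split given for each hotel name are assumptions for illustration only.

```python
def fits(pattern, kinds, candidate):
    """True if `candidate` satisfies the position conditions of `pattern`.

    `kinds` marks each pattern word as "decision" (must match exactly)
    or "alternative" (wild card covering one or more candidate words).
    """
    def match(i, j):
        if i == len(pattern):
            return j == len(candidate)
        if kinds[i] == "decision":
            return (j < len(candidate)
                    and candidate[j] == pattern[i]
                    and match(i + 1, j + 1))
        # Wild card: try every non-empty run of candidate words.
        return any(match(i + 1, k) for k in range(j + 1, len(candidate) + 1))

    return match(0, 0)


def select_candidates(pattern, kinds, vocabulary):
    """Keep the vocabulary entries that fit the recognition result."""
    return [words for words in vocabulary if fits(pattern, kinds, words)]


vocabulary = [
    ["pulse", "hotel"],                       # split assumed
    ["Tokyo", "stay in", "hotel"],
    ["Tokyo", "ENTA", "continental", "hotel"],
]
chosen = select_candidates(
    ["Tokyo", "what", "hotel"],
    ["decision", "alternative", "decision"],
    vocabulary,
)
```

Here `chosen` keeps the two entries of the form "Tokyo ... hotel" and drops "pulse / hotel", which has no "Tokyo" in first position.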
[0053] (Step S103) At this step, it is determined whether the wild card expression included in the speech recognition result
is a number word substitution word or a rhythm word. This can be checked from the "kind of wild card expression"
information of drawing 4 passed from the voice-analysis section 101. In the case of a rhythm word, processing proceeds to
Step S104; in the case of a number word substitution word, the vocabulary extracted at Step S102 is outputted and processing
ends. When two or more vocabularies to output exist, some or all of them may be outputted, and handling such as presenting
the alternatives to the user for selection is decided by the processing (200 in drawing) peculiar to the application at the
output destination.
[0054] (Step S104) At this step, the output vocabulary is narrowed further by comparing the "word utterance time"
information passed from the voice-analysis section 101 with the "allowed-time" information recorded in the rhythm
information-storage section 104 for the vocabulary extracted at Step S102. For example, in the case of "Tokyo /
HONYARARA / hotel", the ratio between the phonation time of each non-wild card expression portion, "Tokyo" and "hotel",
and the allowed time of "Tokyo" and "hotel" recorded in the allowed-time information of the rhythm information-storage
section 104 for the candidate vocabulary is calculated. The phonation time of the wild card portion of the input signal is
scaled by the average of these ratios, the scaled phonation time is compared with the allowed time, and only the candidates
within a certain threshold are extracted. When the vocabulary is not narrowed by this processing, the time comparison can
instead be used to determine the priority of the vocabulary to output.

[0055] (Step S105) At this step, the vocabulary to output is determined by comparing the "rhythm parameter" information
passed from the voice-analysis section 101 with the "rhythm parameter" information recorded in the rhythm
information-storage section 104 for the vocabulary extracted at Step S104. For example, the comparison is made by DP
matching using the method described in "keyword spotting using pitch pattern information" (the Acoustical Society of Japan
lecture collected works, September, Heisei 8, pp.29-30). Although the comparison method depends on the rhythm parameter
used, any rhythm comparison method that can handle that parameter may be used in this embodiment. The vocabulary whose
rhythm information is most similar to the uttered voice is then outputted, and processing ends. Alternatively, when two or
more candidates are allowed, they may be outputted with priority given in order of rhythm similarity.
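
The rhythm comparison can be illustrated with a generic dynamic-programming alignment (dynamic time warping) over numeric pitch contours. This is a stand-in chosen for illustration and is not the procedure of the cited paper; representing the rhythm parameter as a list of numbers, the absolute-difference local cost, and the function names are all assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric contours."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def rank_by_rhythm(input_contour, stored_contours):
    """Order candidate names from most to least similar rhythm."""
    return sorted(
        stored_contours,
        key=lambda name: dtw_distance(input_contour, stored_contours[name]),
    )


# Hypothetical contours (arbitrary pitch levels), for illustration only.
ranking = rank_by_rhythm(
    [1, 3, 2],
    {"candidate A": [1, 3, 3, 2], "candidate B": [3, 1, 1]},
)
```

Returning the full ranking rather than only the best match corresponds to the option of outputting several candidates in order of rhythm similarity.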

[0056] The above is the composition, function, and processing method of the permutation representation collating section 103
concerning this invention.

[0057] Next, the voice input interpretation method described above is explained in more detail, taking as an example the
operation when a user performs voice input to a geographic information system as the application.
[0058] Suppose that information on four hotels (a pulse hotel, the Tokyo stay in hotel, the mouth hotel of the Tokyo round
head, the Tokyo ENTA continental hotel) is registered in this geographic information system and that the names of the four
hotels are recorded in the lexical storage section 102. Suppose further that the rhythm word "HONYARARA" mentioned above
is registered in the lexical storage section 102 as wild card expression and that, together with the vocabulary generated
from the above four hotels and "HONYARARA", the vocabulary shown in drawing 7 is registered in the lexical storage
section 102.

[0059] Moreover, suppose that idea information, rhythm information, and allowed-time information are obtained from the
names of the four registered hotels and that information as shown in drawing 8 is recorded in the rhythm information-storage
section 104.
[0060] Suppose that the user wanted to ask about the "Tokyo stay in hotel" but did not clearly remember the "stay in"
portion, and performed the voice input "Tokyo HONYARARARA hotel" to this geographic information system. The wild card
expression "HONYARARARA" included in this utterance is assumed to have been uttered with the rhythm of "stay in" in
mind.
[0061] Hereafter, the operation of each part in the case of this example is described.

[0062] First, in the voice-analysis section 101, continuation word recognition is performed on the inputted voice with the
vocabulary shown in drawing 7. Suppose that "Tokyo / HONYARARARA / hotel" is chosen as the recognition result. The
information shown in drawing 9 is then outputted to the permutation representation collating section 103, together with the
utterance time information obtained during recognition processing and the rhythm information extracted from the input voice.
[0063] The permutation representation collating section 103, having received this information, performs the following
processing.

[0064] (Step S101) From the passed existence information on wild card expression, it is judged that the recognition result
contains wild card expression, and processing proceeds to Step S102.

[0065] (Step S102) From the recognition result information "Tokyo / HONYARARARA / hotel" and the expressional kind
information (decision / alternative / decision), "Tokyo" and "hotel" are identified as the non-wild card expressions, and
vocabulary that fits the position conditions of these two words is searched from the vocabulary (drawing 8) registered in the
rhythm information-storage section 104. In this case, the search condition is that "Tokyo" comes first, "hotel" comes last,
and at least one word exists between them. The "Tokyo stay in hotel", "the mouth hotel of the Tokyo round head", and the
"Tokyo ENTA continental hotel" are found, and the "pulse hotel" is removed from the output candidates or made a
low-ranking candidate.

[0066] (Step S103) From the passed kind information on wild card expression, the wild card expression "HONYARARARA" is
judged to be a rhythm word, and processing proceeds to Step S104.

[0067] (Step S104) From the vocabulary chosen at Step S102, the allowed-time information (drawing 8) of the "Tokyo stay in
hotel" is first compared with the word utterance time information passed from the voice-analysis section 101. First, the
ratio (allowed-time information / word utterance time information) is calculated for "Tokyo" and "hotel", which are
non-wild card expressions, giving "Tokyo": 700 / 650 = 1.0769 and "hotel": 550 / 510 = 1.0784. Next, the average of these
ratios is calculated, and the resulting value (1.0777) is taken as the extension coefficient that brings the input times onto
the same scale as the allowed times in the rhythm information-storage section 104. The input time of the wild card portion
"HONYARARARA" after extension is then 820 msec x 1.0777 = 884 msec. The allowed time of the "stay in" portion, which lies
between "Tokyo" and "hotel" and is taken to be what the wild card expression replaced, is 900 msec. These two times are
then compared (for example, by threshold processing) to investigate whether the duration of the wild card expression
portion is consistent. Drawing 10 shows the result of this calculation for the vocabulary chosen at Step S102.

[0068] Here, for the "Tokyo ENTA continental hotel", "HONYARARARA" is taken to have replaced "ENTA / continental", so of
the allowed times (700msec/650msec/1050msec/550msec), the sum 650 + 1050 = 1700 msec corresponding to "ENTA" and
"continental" is the allowed time corresponding to the wild card expression "HONYARARARA". Suppose, for example, that the
difference between the time after extension and the allowed time is calculated, that candidates whose absolute difference
exceeds a threshold are removed from the output candidates, and that the threshold is set to 100 msec. Then, from the
above, the "Tokyo ENTA continental hotel" is removed from the output candidates or made a low-ranking candidate.
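
The arithmetic of Step S104 can be reproduced with the short sketch below, using the times quoted in the text. The function name and argument layout are assumptions, and the sketch handles only a single wild card that has at least one regular word before or after it.

```python
def duration_check(input_ms, wildcard_index, allowed_ms, threshold_ms=100.0):
    """Scale the wild card duration and compare it with the allowed time.

    `input_ms` holds the utterance time of each recognized word,
    `wildcard_index` is the position of the wild card in it, and
    `allowed_ms` holds the stored times of the candidate's words.
    """
    before = wildcard_index
    after = len(input_ms) - wildcard_index - 1
    split = len(allowed_ms) - after

    # Regular words line up one-to-one at the start and the end.
    fixed_input = input_ms[:before] + input_ms[wildcard_index + 1:]
    fixed_allowed = allowed_ms[:before] + allowed_ms[split:]
    ratios = [a / i for a, i in zip(fixed_allowed, fixed_input)]
    coefficient = sum(ratios) / len(ratios)

    scaled = input_ms[wildcard_index] * coefficient
    allowed_wildcard = sum(allowed_ms[before:split])  # words the wild card covers
    keep = abs(scaled - allowed_wildcard) <= threshold_ms
    return coefficient, scaled, allowed_wildcard, keep


utterance = [650, 820, 510]  # "Tokyo / HONYARARARA / hotel"
stay_in = duration_check(utterance, 1, [700, 900, 550])
continental = duration_check(utterance, 1, [700, 650, 1050, 550])
```

For the "Tokyo stay in hotel" the coefficient rounds to 1.0777 and the scaled time to 884 msec against an allowed 900 msec, so the candidate is kept; for the "Tokyo ENTA continental hotel" the allowed time is 1700 msec, so the 100 msec threshold rejects it.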

[0069] (Step S105) Rhythm information is matched for the "Tokyo stay in hotel" and "the mouth hotel of the Tokyo round
head", which were not removed by the preceding processing. As a result, the vocabulary whose rhythm information is closest
to the rhythm information passed from the voice-analysis section 101 is outputted, or becomes the high-priority vocabulary.
Here, the rhythm information of the "Tokyo stay in hotel" is judged to be the closer to the rhythm of the input voice, the
"Tokyo stay in hotel" is outputted as the high-priority vocabulary, and suitable processing is performed by the processing
(200 in drawing) peculiar to the application at the output destination. Moreover, if the processing peculiar to the
application can handle two or more candidates, "the mouth hotel of the Tokyo round head" and then, if required, the "Tokyo
ENTA continental hotel" and the "pulse hotel" are also outputted in order as low-ranking candidates.

[0070] This ends the processing when the voice input "Tokyo HONYARARARA hotel" is made as described above.
[0071] As explained above, even when a user who does not clearly remember part of the name "Tokyo stay in hotel" makes
the voice input "Tokyo HONYARARARA hotel" using wild card expression, the voice input interpretation equipment
concerning this embodiment can interpret the input as the suitable name and output the information to the application
portion. Moreover, when the user makes the voice input "Tokyo HONYARARARA hotel" using wild card expression, conveying
through rhythm what cannot be expressed as a character string, the system concerning this embodiment interprets the
utterance time information and the rhythm information, so that the "Tokyo stay in hotel" is given priority over the "Tokyo
ENTA continental hotel" and "the mouth hotel of the Tokyo round head", which likewise have names of the form "Tokyo ~
hotel". This shows that the speech information which the user inputted is used effectively.

[0072] (2nd embodiment) Next, the 2nd embodiment of this invention is explained.

[0073] The 1st embodiment used continuation word recognition as the speech recognition method; this embodiment can be
applied even when the speech recognition method is not continuation word recognition.
[0074] Drawing 11 shows an example of the composition of the voice input interpretation equipment according to this
embodiment. As shown in drawing 11, the voice input interpretation equipment 2 of this embodiment is equipped with the
voice-analysis section 201, the lexical storage section 202, the wild card expression detecting element 203, the wild card
expression storage section 204, the permutation representation collating section 205, and the permutation representation
storage section 206. The A/D converter that changes input voice from an analog signal into a digital signal may be provided
either in the voice input interpretation equipment 2 or on the audio input unit 100 side.
[0075] The voice-analysis section 201 connects with the permutation representation collating section 205, the lexical storage
section 202, and the permutation representation storage section 206. When a speech recognition request comes from the
permutation representation collating section 205, it performs word speech recognition using the vocabulary of whichever of
the lexical storage section 202 or the permutation representation storage section 206 is specified, and outputs the result to
the permutation representation collating section 205. Moreover, depending on the recognition method requested, it performs
single-syllable recognition and outputs the recognition result to the permutation representation collating section 205 as a
MORA symbol string. Since these speech recognition methods are not the essence of this invention, the detailed explanation
of them is omitted.

[0076] The lexical storage section 202 connects with the voice-analysis section 201 and the permutation representation
collating section 205. It records the (regular) vocabulary for speech recognition, and records the information shown in
drawing 12 for each vocabulary in a form that the voice-analysis section 201 and the permutation representation collating
section 205 can refer to and use.

[0077] Drawing 12 is the list of the information which the lexical storage section 202 records, shown together with, as an
example, the information recorded for the vocabulary "the Tokyo stay in hotel". "Idea character string" information is a
character string showing the vocabulary to register. "MORA symbol string" information describes the reading of the idea
character string as a MORA symbol string. "MORA symbol string length" information expresses the number of MORA signs in
the MORA symbol string recorded as MORA symbol string information. "Rhythm parameter" information records the rhythm
parameter obtained by analyzing the pitch pattern information of the voice and the like, for example with the method
described in "keyword spotting using pitch pattern information" (the Acoustical Society of Japan lecture collected works,
September, Heisei 8, pp.29-30). Methods other than the above may be used to generate the rhythm parameter. In the example
of drawing 12, the rhythm information obtained is expressed in ** type. This notation is the same as that of the 1st
embodiment. "Speech recognition parameter" information describes any parameter needed for the speech recognition used in
the voice-analysis section 201 when carrying out this invention (since the speech recognition method used here is not the
essence of this invention, the detailed explanation of this parameter is omitted).

[0078] The wild card expression storage section 204 connects with the wild card expression detecting element 203 and stores
wild card expressions, that is, expressions such as "being", "somehow", or "HONYARARA" that stand in for an arbitrary number
of words, in a form that the wild card expression detecting element 203 can refer to and use. The stored wild card
expressions are divided into number word substitution words, which are expressions such as "somehow" and "what ~ what"
that replace some number of words, and rhythm words, such as "TARARARA" and "HONYARARA", which express the rhythm of the
expression to be replaced.

[0079] The wild card expression detecting element 203 is connected to the audio input unit 100, such as a microphone, the
wild card expression storage section 204, and the permutation representation collating section 205. It detects the wild card
vocabulary stored in the wild card expression storage section 204 using the method described in "noise immunity study in
the speech recognition by word spotting" (electronic-intelligence communication society paper magazine Vol.J-74-D-II
February, 1991 pp.121-129). Any other technique that can detect a specific vocabulary may be used instead of the above
method. The wild card expression detecting element 203 then gives information as shown in drawing 13 to the permutation
representation collating section 205 and hands over processing.

[0080] Drawing 13 lists the information passed from the wild card expression detecting element 203 to the permutation
representation collating section 205, together with an example for the input "Tokyo HONYARARA hotel".
"Existence of wild card expression" indicates whether the wild card expression detecting element 203 detected a wild card
expression. In this example "HONYARARA" is detected as a wild card expression, so this item indicates that one exists.
The "waveform signal" is the original signal of the voice input; when a wild card expression is detected, the signal is
separated at the wild card expression portion and passed to the permutation representation collating section 205. In the
example, the input "Tokyo HONYARARA hotel" is split at the wild card expression "HONYARARA" into three parts, "Tokyo",
"HONYARARA", and "hotel", which are passed in order to the permutation representation collating section 205.
"The position of wild card expression" expresses numerically, when a wild card expression exists, which of the separated
waveform signals is the wild card expression. In this example, since "HONYARARA" is the second of the three separated
waveform signals, 2 is output. "The kind of wild card expression" indicates whether the detected wild card expression is a
number word substitution word or a rhythm word. In this example "HONYARARA" is treated as a rhythm word; this depends on the
information registered in the wild card expression storage section 204. "The mora symbol string length of wild card
expression" is the number of mora symbols when the detected wild card expression is a rhythm word. In this example, the wild
card expression "HONYARARA" has four mora symbols. "The rhythm information on wild card expression" describes the rhythm
when the detected wild card expression is a rhythm word. It is obtained by analyzing, for example, the pitch pattern
information of the input voice, and is passed to the permutation representation collating section 205. Whatever method is
used to generate the rhythm parameter, the generated parameter must have the same form as the one recorded in the lexical
storage section 202 and the permutation representation storage section 206.
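
As a rough illustration, the items of drawing 13 can be pictured as a single record. The Python sketch below is an assumption made for illustration only: the class name `WildcardInfo` and its field names are hypothetical and do not appear in the original, and text strings stand in for the separated waveform signals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WildcardInfo:
    """Hypothetical record mirroring the items listed in drawing 13."""
    has_wildcard: bool                 # "existence of wild card expression"
    segments: List[str]                # separated signals, in utterance order
    wildcard_position: Optional[int]   # 1-based index of the wildcard segment
    wildcard_kind: Optional[str]       # "rhythm word" or "number word substitution word"
    mora_length: Optional[int]         # only meaningful for a rhythm word
    rhythm: Optional[List[float]]      # e.g. a pitch contour; the format is an assumption


# The example from the text: "HONYARARA" is the second of three segments.
info = WildcardInfo(
    has_wildcard=True,
    segments=["Tokyo", "HONYARARA", "hotel"],
    wildcard_position=2,
    wildcard_kind="rhythm word",
    mora_length=4,
    rhythm=None,  # pitch analysis is outside the scope of this sketch
)
```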
[0081] The permutation representation collating section 205 is connected to the wild card expression detecting element 203,
the voice-analysis section 201, the lexical storage section 202, and the permutation representation storage section 206.
When a wild card expression is detected, it looks up the proper expression corresponding to the wild card expression portion.
The details of this section are described later.

[0082] The permutation representation storage section 206 is connected to the voice-analysis section 201 and the permutation
representation collating section 205. For each vocabulary item registered in the lexical storage section 202, it stores the
words obtained by separating the item into units that are meaningful as words, for example "Tokyo", "stay in", and "hotel"
from the "Tokyo stay in hotel", and also the combinations of consecutive words such as "Tokyo stay in" and "stay in hotel".
These are stored in the same form as in the lexical storage section 202 (refer to drawing 12).
[0083] Drawing 14 outlines the operation of the permutation representation collating section 205, which plays the central
role in this embodiment. The flow of processing is explained below with reference to drawing 14.

[0084] (Step S201) It is checked whether the voice input contains a wild card expression. This can be checked from the
"existence of wild card expression" information (drawing 13) given by the wild card expression detecting element 203. If a
wild card expression exists, processing proceeds to Step S204; if not, it proceeds to Step S202.
[0085] (Step S202) When it is judged that there is no wild card expression, ordinary speech recognition is performed. The
voice-analysis section 201 is requested to perform word recognition of the input waveform signal against the vocabulary
stored in the lexical storage section 202.
[0086] (Step S203) The speech recognition result output from the voice-analysis section 201 is handed over to the
application-specific processing (200 in drawing) or to higher-level voice-analysis processing, and processing ends.
[0087] (Step S204) When it is judged that there is a wild card expression, it is investigated how the portions that are not
the wild card expression were uttered. For example, from the separated waveform signals passed from the wild card expression
detecting element 203 and the "position of wild card expression" information, the signal of the wild card expression is
identified, and the voice-analysis section 201 is requested to perform word recognition of the remaining portions (the
non-wild card expression sections) against the vocabulary stored in the permutation representation storage section 206. In
the following explanation, the speech recognition result obtained at this step is called the "partial recognition result."
When no suitable vocabulary exists in the permutation representation storage section 206, the partial recognition result for
that non-wild card expression section is regarded as nonexistent.
[0088] (Step S205) Here, it is investigated whether any information lies between the wild card expression and the non-wild
card expression section. This corresponds to the case where a user who does not know the exact pronunciation of a word, but
knows part of its beginning or end, attaches that part before or after the wild card expression, as in "SU NANTOKA". If
the user utters "SU NANTOKA" and "SU NANTOKA" is not registered among the wild card expressions in the wild card expression
storage section 204, the wild card expression detected by the wild card expression detecting element 203 is "NANTOKA", and
the "SU" that the user intended as part of the wild card expression is processed as part of the non-wild card expression.
In this case as well, it is judged whether a portion that should belong to the wild card expression exists in the non-wild
card expression section, and if it does, it is processed as part of the wild card expression.

[0089] First, buffers that store mora symbol strings are prepared for each detected wild card expression section. A buffer
stores, as mora symbols, any information detected between a non-wild card expression section and the wild card expression
section. Since such information may appear either before or after the wild card expression section, two buffers are prepared
for each wild card expression. At Step S205, the processing that extracts the mora symbols to be entered into the buffers is
performed for each detected non-wild card expression section that adjoins a wild card expression section. Drawing 15
outlines this processing (the processing of Step S205) for one of the detected non-wild card expression sections. It is
explained below with reference to drawing 15.

[0090] (Step S205-1) At this step, it is investigated how the non-wild card expression was uttered. For example, the
voice-analysis section 201 is requested to perform syllable-unit speech recognition of the non-wild card expression section
in question. In the following explanation, the mora symbol string output at this step is called the "partial syllable
recognition result."

[0091] (Step S205-2) At this step, it is checked whether a partial recognition result exists for the non-wild card
expression section currently being processed. If no partial recognition result exists, processing proceeds to step S205-3;
if one exists, it proceeds to step S205-4.

[0092] (Step S205-3) When no partial recognition result exists for the non-wild card expression section currently being
processed, it can be judged that this section was uttered as something shorter than any vocabulary item in the permutation
representation storage section 206. The whole of this non-wild card expression section is therefore regarded as part of the
wild card expression section it adjoins. It is judged whether the section lies before or after the adjoining wild card
expression section, and its mora symbol (string) is stored in the corresponding buffer. When the wild card expression section
is a rhythm word, the number of mora symbols stored in the buffer is added to the "string length of wild card expression"
information received from the wild card expression detecting element 203. When this non-wild card expression section lies
before the wild card expression section, the "position of wild card expression" information is decremented by 1, and the
step ends.
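
The bookkeeping of step S205-3 can be sketched as follows. This is a minimal illustration under stated assumptions: the names `WildcardState` and `absorb_segment` are hypothetical, mora symbols are represented as plain strings, and the position update is read here as a decrement by 1 (a segment merged into the wildcard from the front no longer counts as a separate segment), which is an interpretation of the text rather than a confirmed detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WildcardState:
    """Hypothetical per-wildcard state: position, length, and the two buffers."""
    position: int                 # 1-based "position of wild card expression"
    mora_length: int              # "string length of wild card expression"
    is_rhythm_word: bool
    front_buffer: List[str] = field(default_factory=list)  # moras uttered just before
    back_buffer: List[str] = field(default_factory=list)   # moras uttered just after


def absorb_segment(state: WildcardState, moras: List[str],
                   segment_is_before_wildcard: bool) -> WildcardState:
    """Treat a whole non-wildcard segment as part of the adjoining wildcard."""
    if segment_is_before_wildcard:
        state.front_buffer.extend(moras)
        state.position -= 1       # assumed reading: the merged segment disappears in front
    else:
        state.back_buffer.extend(moras)
    if state.is_rhythm_word:
        state.mora_length += len(moras)  # length grows only for rhythm words
    return state
```

For example, merging a one-mora segment "SU" that precedes a four-mora rhythm word at position 2 would leave the wildcard at position 1 with length 5 and "SU" in the front buffer.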
[0093] (Step S205-4) Here, it is checked whether the non-wild card expression section in question contains anything uttered
besides the corresponding partial recognition result. For example, the mora symbol string length of the partial syllable
recognition result for the section currently being processed is compared with the mora symbol string length of the partial
recognition result. If the partial recognition result is longer, or if the two are equal, processing proceeds to step
S205-5; if the partial syllable recognition result is longer, it proceeds to step S205-6.

[0094] (Step S205-5) It is judged that the non-wild card expression section currently being processed contains no
information beyond the corresponding partial recognition result, and the step ends without entering anything into a buffer.

[0095] (Step S205-6) Since the mora symbol string length of the partial recognition result for the non-wild card expression
section currently being processed is shorter than that of the partial syllable recognition result of the same portion, it is
judged that part of the wild card expression may have been uttered in addition to the partial recognition result. It is
therefore investigated which portion of the waveform signal of the non-wild card expression section corresponds to the
partial recognition result.
[0096] For example, as in drawing 16, the mora symbol string of the partial recognition result is applied successively along
the mora symbol string of the partial syllable recognition result, and the two strings are compared at each position.
Drawing 16 compares the partial recognition result "Tokyo" (mora symbol string "TOOKYOO") with the partial syllable
recognition result "TOOKYOOSU". The mora symbol string length of the partial recognition result is 4 and that of the
partial syllable recognition result is 5, so two patterns are possible: the partial recognition result is applied starting
at "TO" or at "O" (the first O) of "TOOKYOOSU." When the partial recognition result is shorter still, patterns that start at
"KYO" or later also appear.
[0097] It is then judged which pattern fits best, and the "remainder" portion is determined. This "remainder" is considered
to be the part, contained in the non-wild card expression section, that should be made part of the wild card expression
portion. As a method of determination, for example, the number of matching mora symbols is counted for each pattern, the
position with the most matching mora symbols is chosen as the place where the partial recognition result exists, and the
portion to which the mora symbol string of the partial recognition result is not applied is extracted as the "remainder."

[0098] In the example of drawing 16, the last symbol "SU" is extracted as the remainder.
[0099] When the position of the partial recognition result cannot be determined uniquely, for example because two or more
patterns attain the maximum number of matching mora symbols, it is judged that no remainder exists. Alternatively, it may be
judged that no remainder exists when the number of matching mora symbols is at or below a threshold.
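
The comparison of step S205-6 can be sketched as a sliding match over mora symbols. The function name `find_remainder`, the list-of-strings representation of mora symbol strings, and the `min_matches` parameter are assumptions made for illustration; the threshold rule is implemented as "reject at or below `min_matches - 1`".

```python
from typing import List, Optional, Tuple


def find_remainder(syllable_moras: List[str], word_moras: List[str],
                   min_matches: int = 1) -> Optional[Tuple[List[str], List[str]]]:
    """Slide word_moras along syllable_moras and pick the best-matching offset.

    Returns (moras_before, moras_after) left uncovered by the word, or None
    when no remainder can be determined (no extra moras, an ambiguous best
    offset, or too few matching moras).
    """
    n, m = len(syllable_moras), len(word_moras)
    if m >= n:
        return None  # nothing was uttered beyond the recognized word
    scores = []
    for offset in range(n - m + 1):
        window = syllable_moras[offset:offset + m]
        scores.append(sum(1 for a, b in zip(window, word_moras) if a == b))
    best = max(scores)
    if best < min_matches or scores.count(best) > 1:
        return None  # position of the word is not unique, or match too weak
    offset = scores.index(best)
    return syllable_moras[:offset], syllable_moras[offset + m:]


# The example of drawing 16: "TOOKYOO" (4 moras) against "TOOKYOOSU" (5 moras).
remainder = find_remainder(["TO", "O", "KYO", "O", "SU"], ["TO", "O", "KYO", "O"])
```

With these inputs the offset-0 pattern matches all four mora symbols and the offset-1 pattern matches none, so the trailing "SU" is the only uncovered symbol.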

[0100] (Step S205-7) Here, it is checked whether a remainder was extracted as a result of step S205-6. If a remainder
exists, processing proceeds to step S205-8; if not, it proceeds to step S205-5.
[0101] (Step S205-8) Here, it is checked whether the remainder extracted in step S205-6 lies adjacent to the wild card
expression portion. In the example of drawing 16, since the remainder "SU" lies immediately before the wild card expression
"NANTOKA", it is judged that the remainder "SU" adjoins "NANTOKA". Conversely, when "NANTOKA" comes before "Tokyo SU", the
remainder "SU" is at the end of "Tokyo SU" and is therefore judged not to adjoin "NANTOKA". In that case, if the remainder
were "TO", "TOO", or the like, it could be judged that the remainder lies immediately after the wild card expression
"NANTOKA." If the remainder lies in an adjacent position, processing proceeds to step S205-9; if not, it proceeds to step
S205-5.
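
The adjacency test of step S205-8 can be sketched as below. The function name `adjacent_remainder`, the 1-based segment indices, and the "front"/"back" labels for the two buffers are hypothetical names introduced for illustration.

```python
from typing import List, Optional, Tuple


def adjacent_remainder(before: List[str], after: List[str],
                       segment_index: int, wildcard_index: int
                       ) -> Optional[Tuple[str, List[str]]]:
    """Decide whether a remainder touches the wildcard.

    `before` / `after` are the remainder moras found in front of / behind the
    recognized word inside one non-wildcard segment. Returns the buffer name
    ("front" = just before the wildcard, "back" = just after it) together with
    the moras to store, or None when the remainder is not adjacent.
    """
    if segment_index == wildcard_index - 1 and after:
        return ("front", after)   # segment precedes the wildcard; its tail touches it
    if segment_index == wildcard_index + 1 and before:
        return ("back", before)   # segment follows the wildcard; its head touches it
    return None


# "Tokyo SU" is segment 1 and "NANTOKA" is segment 2, so the tail "SU" is adjacent.
hit = adjacent_remainder([], ["SU"], segment_index=1, wildcard_index=2)
# If "NANTOKA" came first, a tail "SU" in the following segment would not touch it.
miss = adjacent_remainder([], ["SU"], segment_index=2, wildcard_index=1)
```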

[0102] (Step S205-9) Since the remainder extracted in step S205-6 adjoins the wild card expression, it is regarded as part
of the wild card expression section it adjoins. It is judged whether the remainder lies before or after the adjoining wild
card expression section, and its mora symbol (string) is stored in the corresponding buffer. When the wild card expression
section is a rhythm word, the number of mora symbols stored in the buffer is added to the "string length of wild card
expression" information received from the wild card expression detecting element 203, and the step ends.

[0103] As an alternative to the above method, if the voice-analysis section 201 is given a word detection capability like
that of the wild card expression detecting element 203, the word of the speech recognition result obtained at Step S204 can
be detected within the separated waveform signal. That word is then cut out, and syllable-unit speech recognition of the
signal remaining at the boundary with the wild card expression is requested from the voice-analysis section 201. In this
way, the mora symbol (string) that forms part of the wild card expression can be estimated just as above.

[0104] (Step S206) Using as search conditions the information acquired at Steps S204-S205 and the information shown in
drawing 13 obtained from the wild card expression detecting element 203, the expression that fits the wild card expression
portion is searched for among the vocabulary stored in the permutation representation storage section 206, so that the
result agrees with a vocabulary item stored in the lexical storage section 202. Drawing 17 is the flow chart of the operation
performed at Step S206. The flow of processing is explained below with reference to drawing 17.

[0105] (Step S206-1) At this step, vocabulary that fits the passed speech recognition result and is a candidate for output
is chosen from the lexical storage section 202. For example, using the idea information among the information (drawing 12)
recorded in the lexical storage section 202, the vocabulary that fits the positions of the partial recognition results of
the non-wild card expression portions is chosen, on the conditions that the wild card expression has a length of one or more
words and adjoins the non-wild card expression portions. Then, for each selected vocabulary item, the portion that was
replaced by the wild card expression is searched for among the expressions recorded in the permutation representation
storage section 206. For example, suppose the obtained partial recognition results are "Tokyo" and "hotel", and the order of
the separated waveform signals and the positional information of the wild card expression show that the input has the order
"Tokyo (wild card expression) hotel". Then a vocabulary item such as the "Tokyo stay in hotel", in which the word "Tokyo"
comes first, the word "hotel" comes last, and at least one word lies between them, is chosen as a fitting vocabulary item.
The expression "stay in" and the like are then retrieved from the permutation representation storage section 206 as the
portion that was replaced by the wild card expression.
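
The selection of step S206-1 can be sketched as a prefix/suffix match over word lists. This is an illustration under assumptions: `select_candidates` and `WILDCARD` are hypothetical names, vocabulary items are represented as lists of words, and the middle word of the "Tokyo stay in hotel" (which the translation also renders as "Soutine") is written "stay in".

```python
from typing import List, Optional, Tuple

WILDCARD = None  # marker for the wildcard slot in a recognized pattern


def select_candidates(pattern: List[Optional[str]],
                      vocabulary: List[List[str]]
                      ) -> List[Tuple[List[str], List[str]]]:
    """Return (vocabulary item, filler words) pairs that fit the pattern.

    The words before and after the wildcard must match the item exactly, and
    the wildcard must stand for at least one word.
    """
    i = pattern.index(WILDCARD)
    prefix, suffix = pattern[:i], pattern[i + 1:]
    results = []
    for words in vocabulary:
        filler_len = len(words) - len(prefix) - len(suffix)
        if filler_len < 1:
            continue  # wildcard must cover one or more words
        if words[:len(prefix)] != prefix:
            continue
        if words[len(words) - len(suffix):] != suffix:
            continue
        results.append((words, words[len(prefix):len(words) - len(suffix)]))
    return results


vocabulary = [["Tokyo", "stay in", "hotel"], ["pulse", "hotel"]]
matches = select_candidates(["Tokyo", WILDCARD, "hotel"], vocabulary)
```

Here only the first item fits, and its filler `["stay in"]` is what would then be looked up in the permutation representation storage section 206.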
[0106] (Step S206-2) At this step, the expressions extracted in step S206-1 are narrowed further, using as a search
condition the mora symbol (string) recorded in the buffers at step S205.

[0107] (Step S206-3) At this step, it is determined whether the wild card expression included in the speech recognition
result is a number word substitution word or a rhythm word. This can be checked from the "kind of wild card expression"
information of drawing 13 passed from the wild card expression detecting element 203. In the case of a rhythm word,
processing proceeds to step S206-4. In the case of a number word substitution word, the regular vocabulary item consisting
of the expression extracted in step S206-2 and the partial recognition results is output, and processing ends. When two or
more vocabulary items are to be output, either some or all of them may be output, and the handling, such as presenting the
several solutions to the user for selection, is left to the application-specific processing (200 in drawing) at the output
destination.

[0108] (Step S206-4) At this step, the output vocabulary is narrowed further by comparing the mora symbol string length
information of the wild card expression passed from the wild card expression detecting element 203 with the mora symbol
length information recorded in the permutation representation storage section 206 for the vocabulary extracted in step
S206-1. For example, only the items for which the difference between the two mora symbol string lengths is within a
threshold are extracted. Instead of narrowing the vocabulary by this processing, it is also possible to use it to determine
the priority of the vocabulary to output.
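
The length comparison of step S206-4 amounts to a simple threshold filter. In the sketch below the function name `filter_by_mora_length`, the dictionary layout, the candidate names, and the default threshold of 1 are all assumptions for illustration; the original does not state a threshold value.

```python
from typing import Dict, List


def filter_by_mora_length(candidate_lengths: Dict[str, int],
                          wildcard_mora_length: int,
                          threshold: int = 1) -> List[str]:
    """Keep candidates whose stored mora length is close to the wildcard's length."""
    return [
        name
        for name, length in candidate_lengths.items()
        if abs(length - wildcard_mora_length) <= threshold
    ]


# Hypothetical candidates: a 4-mora wildcard keeps the 4-mora filler only.
kept = filter_by_mora_length({"filler A": 4, "filler B": 7}, wildcard_mora_length=4)
```

To rank rather than filter, the same difference `abs(length - wildcard_mora_length)` could be used as a sort key.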

[0109] (Step S206-5) At this step, the vocabulary to output is determined by comparing the "rhythm parameter" information
passed from the voice-analysis section 101 with the "rhythm parameter" information recorded in the rhythm
information-storage section 104 for the vocabulary extracted in step S206-4. For example, the comparison is performed by
matching with the DP method, following the method described in "keyword spotting using pitch pattern information"
(the Acoustical Society of Japan lecture collected works, September, Heisei 8 (1996), pp.29-30). The comparison method
depends on how the rhythm parameter is constituted, but any rhythm comparison method may be used in this embodiment as long
as it can handle the parameter. The vocabulary item whose rhythm information is most similar to the uttered voice is output,
and processing ends. Alternatively, when two or more candidates are allowed, they may be output with priority given in order
of rhythm similarity.
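
DP matching of this kind is commonly realized as dynamic time warping. The sketch below is a generic textbook DTW over one-dimensional contours, not the method of the cited paper; representing the "rhythm parameter" as a list of pitch values, and the names `dtw_distance` and `rank_by_rhythm`, are assumptions made for illustration.

```python
from typing import Dict, List


def dtw_distance(a: List[float], b: List[float]) -> float:
    """Dynamic time warping distance between two contours (absolute difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: stretch a, stretch b, or advance both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def rank_by_rhythm(input_contour: List[float],
                   stored_contours: Dict[str, List[float]]) -> List[str]:
    """Order candidates from most to least similar to the input contour."""
    return sorted(stored_contours,
                  key=lambda name: dtw_distance(input_contour, stored_contours[name]))


# Hypothetical contours: candidate "a" has the same rise-fall shape as the input.
ranking = rank_by_rhythm([1.0, 3.0, 1.0], {"b": [5.0, 5.0, 5.0], "a": [1.0, 3.0, 1.0]})
```

The first element of the ranking corresponds to outputting the single most similar item; the full list corresponds to outputting several candidates in priority order.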
[0110] The above is the configuration, function, and processing method of the permutation representation collating section
205 according to this embodiment.

[0111] Next, the voice input interpretation method described above is explained in more detail. Here, the geographic
information system used in the explanation of the 1st example is taken up again, and the operation when a user performs
voice input is explained as an example.

[0112] Four hotels around the Tokyo station (the "Tokyo stay in hotel", "the mouth hotel of the Tokyo round head", the
"pulse hotel", and the "Tokyo ENTA continental hotel") are registered in this geographic information system. For these four
hotels, the information shown in drawing 18 and the parameters required for the speech recognition of each are recorded in
the lexical storage section 202. The expressions registered in the permutation representation storage section 206, namely
the words separated from these vocabulary items and the combinations of consecutive words, are as shown in drawing 19.

[0113] Moreover, suppose that the number word substitution word "NANTOKA" is registered into the wild card expression 
storage section 204 as wild card expression. 

[0114] Next, suppose that the user wanted to ask about the "Tokyo stay in hotel" but did not clearly remember the "stay in"
portion, and performed the voice input "Tokyo SU NANTOKA hotel" to this geographic information system.

[0115] Hereafter, to make the notation clear, a waveform signal before a speech recognition result is obtained is written in
square brackets, as in [signal], and a character string obtained after a speech recognition result is obtained is written in
quotation marks, as in "character string."

[0116] The wild card expression detecting element 203 first detects the wild card expression in the input. The signal
[Tokyo SU NANTOKA hotel] contains the wild card expression [NANTOKA], and this is detected as a wild card expression.
Information like that of drawing 20 is then passed to the permutation representation collating section 205.

[0117] The following is processing in the permutation representation collating section 205. 

[0118] (Step S201) From the existence information of wild card expression, it is confirmed that a wild card expression
exists.
[0119] (Step S204) From the separated waveform signals passed and the positional information of the wild card expression,
it is found that the portions requiring speech recognition are the signals [Tokyo SU] and [hotel]. The voice-analysis
section 201 is requested to perform word recognition of these two signals against the vocabulary set recorded in the
permutation representation storage section 206. Suppose that, as partial recognition results, "Tokyo" is obtained for the
signal [Tokyo SU] and "hotel" is obtained for the signal [hotel].
[0120] (Step S205) Processing begins with the signal [Tokyo SU].

[0121] (Step S205-1) Syllable-unit recognition of the signal [Tokyo SU] is requested from the voice-analysis section 201.
Suppose that, as a result, the mora symbol string "TOOKYOOSU" is obtained.

[0122] (Step S205-2) Since "Tokyo" was obtained as the recognition result of the signal [Tokyo SU], processing proceeds to
step S205-4.

[0123] (Step S205-4) The mora symbol string length of the mora symbol string "TOOKYOOSU" is 5. Suppose that the partial
recognition result "Tokyo" is recorded in the permutation representation storage section 206 as shown in drawing 21.

[0124] The two mora symbol string lengths are compared, and since the partial syllable recognition result "TOOKYOOSU" of
the input signal is longer, processing proceeds to step S205-6.

[0125] (Step S205-6) When the syllable recognition result "TOOKYOOSU" is compared with the mora symbol string "TOOKYOO" of
the partial recognition result "Tokyo", the result is as in drawing 16, and the mora symbol "SU" is detected as the
remainder.

[0126] (Step S205-7) Since the mora symbol "SU" was detected as a remainder, processing proceeds to step S205-8.

[0127] (Step S205-8) The remainder mora symbol "SU" lies at the end of the syllable recognition result "TOOKYOOSU", and the
signal [Tokyo SU] from which this syllable recognition result was obtained lies immediately before the wild card expression
portion [NANTOKA]. The remainder "SU" is therefore judged to be part of the wild card expression, and processing proceeds to
step S205-9.

[0128] (Step S205-9) The mora symbol "SU" is entered into the buffer that stores the pronunciation of the front part of the
wild card expression.

[0129] Next, the same processing is performed on the signal [hotel]. Here, suppose that no remainder was found besides the
partial recognition result "hotel", so that processing proceeds to the next step without recording anything in a buffer.

[0130] (Step S206) Here, a suitable vocabulary item is searched for using the information obtained so far.

[0131] (Step S206-1) From the separated waveform signal information, the partial recognition results, the positional
information of the wild card expression, and so on, it is judged that the vocabulary item targeted by the voice input has
the form "Tokyo (wild card expression) hotel." When the vocabulary meeting these conditions is extracted from the vocabulary
recorded in the lexical storage section 202, the "Tokyo stay in hotel", "the mouth hotel of the Tokyo round head", and the
"Tokyo ENTA continental hotel" are chosen. From the same conditions, "stay in", "the mouth of a round head", and
"ENTA continental" are chosen from the expressions registered in the permutation representation storage section 206 as
output candidates for the expression that the wild card expression replaced. At this point, the "pulse hotel" is removed
from the output candidates or becomes a low-ranking candidate.

[0132] (Step S206-2) Referring to the buffer recorded at Step S205, the expression "stay in", which begins with the mora
symbol "SU", is judged to be the leading candidate. The vocabulary item "Tokyo stay in hotel", which contains "stay in",
therefore becomes the leading output candidate. "The mouth hotel of the Tokyo round head" and the "Tokyo ENTA continental
hotel" are removed from the output candidates or become low-ranking candidates.
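
The narrowing by the buffer contents in this step can be sketched as a stable re-ranking: fillers whose mora symbol string begins with the buffered symbols move to the front. The function name `rank_by_front_buffer` is hypothetical, and the mora symbol strings below are placeholders made up for illustration; the text only states that the "stay in" filler (also rendered "Soutine") begins with "SU", and drawing 19 is not reproduced here.

```python
from typing import Dict, List


def rank_by_front_buffer(fillers: Dict[str, List[str]],
                         front_buffer: List[str]) -> List[str]:
    """Put fillers that start with the buffered moras first, keeping the rest in order."""
    def starts_with_buffer(moras: List[str]) -> bool:
        return moras[:len(front_buffer)] == front_buffer

    # sorted() is stable, so ties keep their original relative order
    return sorted(fillers, key=lambda name: not starts_with_buffer(fillers[name]))


# Placeholder mora strings; only the leading "SU" of "stay in" comes from the text.
fillers = {
    "the mouth of a round head": ["MA", "RU"],
    "ENTA continental": ["E", "N", "TA"],
    "stay in": ["SU", "TE", "I", "N"],
}
ranked = rank_by_front_buffer(fillers, ["SU"])
```

The other two fillers are kept as lower-ranking entries rather than dropped, which corresponds to the "low-ranking candidate" option in the text.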

[0133] (Step S206-3) Since the information sent from the wild card expression detecting element 203 shows that the wild card
expression used ("NANTOKA") is a number word substitution word, the "Tokyo stay in hotel" is output as the first-place
candidate. Alternatively, when the application portion can handle two or more candidates, "the mouth hotel of the Tokyo
round head" and the "Tokyo ENTA continental hotel" are output as lower-ranking candidates, and the "pulse hotel" is output
as a still lower-ranking candidate. The application-specific processing (200 in drawing) receives this output and performs
suitable processing.

[0134] This ends the processing for the voice input "Tokyo SU NANTOKA hotel".
[0135] As explained above, with the voice input analysis apparatus according to this embodiment, even when the user does
not clearly remember the name "Tokyo stay in hotel", the user can utter the remembered portions concretely and replace the
portion not remembered with a wild card expression, as in "Tokyo SU NANTOKA hotel". The input is then interpreted as the
proper name, and information can be output to the application portion. Moreover, because the user supplies the fine-grained
information he or she does know ("Tokyo SU ... hotel") and uses a wild card expression only for the unknown portion, priority
is given to the "Tokyo stay in hotel" over "the mouth hotel of the Tokyo round head" and the "Tokyo ENTA continental
hotel", which likewise have names of the form "Tokyo ... hotel". The speech information that the user input is thus used
effectively.

[0136] With equipment constituted in this way, voice input interpretation equipment can be built that operates even if a
user does not correctly remember the word or text to be uttered.

[0137] For example, when a user remembers only part of the word or text to be uttered, misrecognition of the voice is
suppressed, and voice input interpretation equipment can be built that leads the output of a voice-input system to what the
user intended.

[0138] Likewise, when a user remembers only the "rhythm" of the word or text to be uttered, misrecognition of the voice is
suppressed, and voice input interpretation equipment can be built that leads the output of a voice-input system to what the
user intended.

[0139] The effects of each embodiment are not limited to the examples mentioned above. For example, the list of results
produced by substitution processing in the permutation representation collating section 103 (in the 1st embodiment) or in
the permutation representation collating section 205 (in the 2nd embodiment) can be shown to the user, and a malfunction can
be avoided by having the user choose the right one.

[0140] It is also possible to use the equipment as an input means of a multi-modal interface, to narrow the search range
further, to suppress redundancy in the output, and to reduce the user's burden.

[0141] The equipment can also be used as an input means of any equipment that involves voice input, not only of a
multi-modal interface. It is also possible to analyze and use rhythm information not only for the portion expressed with a
wild card expression but for all the input speech information.

[0142] Next, the equipment configuration for realizing the processing of this voice input interpretation equipment using
software is explained with reference to drawing 22.

[0143] In this case, the hardware of this voice input interpretation equipment consists of CPU 21, RAM 22 for storing a
program and required data, disk drive equipment 24, storage 25, and I/O device 26.

[0144] In the case of the 1st embodiment, the voice-analysis section 101, the lexical storage section 102, the permutation
representation collating section 103, and the rhythm information-storage section 104 of drawing 1 are each constituted by a
program that describes its procedure.

[0145] In the case of the 2nd embodiment, the voice-analysis section 201, the lexical storage section 202, the wild card
expression detecting element 203, the wild card expression storage section 204, the permutation representation collating
section 205, and the permutation representation storage section 206 of drawing 11 are each constituted by a program that
describes its procedure.

[0146] The information stored in each storage section may be integrated with the program, or may be provided separately
from the program.

[0147] The program that describes these procedures is stored in RAM 22 as a program for controlling the computer system of
drawing 22, and is executed by CPU 21. Following the procedure of the program stored in RAM 22, CPU 21 performs computation,
control of storage 25 and I/O device 26, and so on, and realizes the desired functions.

[0148] Various methods can be used to install the program in RAM 22. For example, the above-mentioned program for
controlling a computer system (the program that describes the procedures of the voice-analysis section 101, the lexical
storage section 102, the permutation representation collating section 103, and the rhythm information-storage section 104
of drawing 1, or the program that describes the procedures of the voice-analysis section 201, the lexical storage section
202, the wild card expression detecting element 203, the wild card expression storage section 204, the permutation
representation collating section 205, and the permutation representation storage section 206 of drawing 11) is stored on a
computer-readable storage medium (for example, a removable medium such as a floppy disk or CD-ROM). As shown in drawing 22,
the program is read using the disk drive equipment 24 that corresponds to the storage medium, and is stored in RAM 22.
Alternatively, the program is first installed in the disk drive equipment 24 or the like and loaded into RAM 22 from there
at the time of execution.

[0149] When the storage medium that stores the program is an IC card, the program can be read using an IC card reader.
Furthermore, the program can also be received from a predetermined interface device through a network.

[0150] The application that uses the interpretation result may be mounted on the voice input interpretation equipment
itself, or the equipment carrying the voice input interpretation equipment and the equipment carrying the application may be
independent of each other. Likewise, the program that realizes the voice input interpretation equipment and the program that
realizes the application using the interpretation result may be executed on the same CPU, or on separately provided CPUs.

[0151] The first and second embodiments have been described on the premise that only one wild card expression is input, but 
input containing plural wild card expressions can also be handled. In the first embodiment, the corresponding vocabulary is 
generated in the lexical storage section 102, and the permutation representation collating section 103 performs the same 
processing on each corresponding wild card expression portion. In the second embodiment, information about the position, kind, 
and rhythm of each detected wild card expression is passed to the permutation representation collating section 205, and a 
buffer for recording wild card expressions in the middle of the input is added to the buffers that record wild card expression 
portions. When wild card expressions appear consecutively, they can be handled by collecting them into one wild card 
expression and processing it in the same way as each detected wild card expression. 
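
The handling of consecutive wild card expressions described in [0151] can be sketched as follows. This is only an 
illustrative sketch and is not taken from the specification: the names Segment and merge_consecutive_wildcards are 
hypothetical, and summing the unit counts of the merged portions is an assumption about how rhythm information would be 
combined. 

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One portion of the analyzed input (hypothetical structure)."""
    text: str          # recognized text of this portion
    is_wildcard: bool  # True if detected as a wild card expression
    units: int         # length in analysis units (for example, morae)


def merge_consecutive_wildcards(segments: List[Segment]) -> List[Segment]:
    """Collect runs of adjacent wild card portions into one wild card portion."""
    merged: List[Segment] = []
    for seg in segments:
        if seg.is_wildcard and merged and merged[-1].is_wildcard:
            prev = merged[-1]
            # Assumption: the merged portion keeps the combined unit count.
            merged[-1] = Segment(prev.text + " " + seg.text, True,
                                 prev.units + seg.units)
        else:
            merged.append(Segment(seg.text, seg.is_wildcard, seg.units))
    return merged


example = [
    Segment("TOKYO", False, 4),
    Segment("ra ra", True, 2),
    Segment("ra", True, 1),
    Segment("hotel", False, 3),
]
result = merge_consecutive_wildcards(example)
```

After merging, each remaining wild card portion can be passed to the collating step exactly as a single detected wild card 
expression would be. 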

[0152] Moreover, the search conditions set in the first and second embodiments are not specific to each embodiment; for 
example, the voice input time may be used when searching the permutation representations in the second embodiment. In 
addition, the first embodiment has been described on the premise that input which mixes a correct expression into part of a 
wild card expression, such as ""SU" somehow", is not made, but such input can easily be handled if a vocabulary entry such as 
"** / somehow" is set in the lexical storage section 102. Moreover, even if wild card expressions are not classified as number 
word substitution words or rhythm words, it is also possible to use rhythm and the like as search conditions for all 
expressions. 

[0153] Moreover, by treating analysis in mora units as analysis in more general units such as syllables or phonemes, this 
invention can be applied not only to Japanese but to any language in which wild card expressions exist. It is also possible to 
apply this invention to music search, in which the portions whose words the user does not know are input by singing them in 
rhythm. 
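
As an illustration of [0152] and [0153], the following sketch matches an input containing a wild card against vocabulary 
entries that have been split into generic units, optionally using the unit count of the wild card as a search condition. It is 
a minimal sketch under stated assumptions, not the procedure of the embodiments: the function names matches and search, the 
tuple form ("*", count), the syllable splits, and the entry "TOKYO bay hotel" are all hypothetical. 

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

# A wild card is written ("*", count); count is the number of units implied
# by the rhythm of the input, or None when no such condition is applied.
Wildcard = Tuple[str, Optional[int]]
Item = Union[str, Wildcard]


def matches(pattern: Sequence[Item], units: Sequence[str]) -> bool:
    """Return True if the unit sequence fits the pattern."""
    if not pattern:
        return not units
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, tuple):
        count = head[1]
        # Without a count, the wild card may cover one or more units.
        lengths = [count] if count is not None else range(1, len(units) + 1)
        return any(n <= len(units) and matches(rest, units[n:])
                   for n in lengths)
    return bool(units) and units[0] == head and matches(rest, units[1:])


def search(pattern: Sequence[Item],
           vocabulary: Dict[str, List[str]]) -> List[str]:
    """List the vocabulary entries whose units fit the pattern."""
    return [name for name, units in vocabulary.items()
            if matches(pattern, units)]


# Hypothetical vocabulary, split into syllable-like units for illustration.
vocabulary = {
    "TOKYO stay-in hotel": ["to", "kyo", "stay", "in", "ho", "tel"],
    "TOKYO bay hotel": ["to", "kyo", "bay", "ho", "tel"],
}

any_length = search(["to", "kyo", ("*", None), "ho", "tel"], vocabulary)
two_units = search(["to", "kyo", ("*", 2), "ho", "tel"], vocabulary)
```

Because the matcher only compares sequences of units, the same code applies whether the units are morae, syllables, or 
phonemes; adding the unit count narrows the candidates in the way a rhythm-based search condition would. 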

[0154] This invention is not limited to the embodiments described above, and can be carried out with various modifications 
within its technical scope. 

[0155] 

[Effect of the Invention] According to this invention, the portion of the input voice in which part of the regular vocabulary 
has been replaced by an alternative expression is detected and is replaced by the normal expression corresponding to that 
portion. Therefore, even if a user does not clearly remember the vocabulary permitted as voice input, voice input that 
includes the alternative expression can be accepted and interpreted. 




DESCRIPTION OF DRAWINGS 



[Brief Description of the Drawings] 

[Drawing 1] A diagram showing an example configuration of the voice input interpretation equipment according to the first 
embodiment of this invention 
[Drawing 2] A diagram showing an example of the information recorded in the lexical storage section 
[Drawing 3] A diagram showing an example of the vocabulary recorded in the lexical storage section 
[Drawing 4] A diagram showing an example of the information passed from the voice-analysis section to the permutation 
representation collating section 
[Drawing 5] A diagram showing an example of the information recorded in the rhythm information-storage section 
[Drawing 6] A flow chart showing an example of the operation of the permutation representation collating section 
[Drawing 7] A diagram showing an example of the vocabulary registered in the lexical storage section 
[Drawing 8] A diagram showing an example of the information registered in the rhythm information-storage section 
[Drawing 9] A diagram showing an example of the information output from the voice-analysis section to the permutation 
representation collating section 
[Drawing 10] A diagram showing an example of the search result for vocabulary that matches a speech recognition result 
[Drawing 11] A diagram showing an example configuration of the voice input interpretation equipment according to the second 
embodiment of this invention 
[Drawing 12] A diagram showing an example of the information recorded in the lexical storage section 
[Drawing 13] A diagram showing an example of the information passed from the wild card expression detecting element to the 
permutation representation collating section 
[Drawing 14] A flow chart showing an example of the operation of the permutation representation collating section 
[Drawing 15] A flow chart showing an example of the procedure for a non-wild card expression portion 
[Drawing 16] A diagram explaining the search for a wild card expression portion 
[Drawing 17] A flow chart showing an example of the procedure for a wild card expression portion 
[Drawing 18] A diagram showing an example of the vocabulary registered in the lexical storage section 
[Drawing 19] A diagram showing an example of the information registered in the permutation representation storage section 
[Drawing 20] A diagram showing an example of the information passed from the wild card expression detecting element to the 
permutation representation collating section 
[Drawing 21] A diagram showing an example of the information recorded in the permutation representation storage section 
[Drawing 22] A diagram showing an example of the hardware configuration 

[Description of Notations] 

1, 2 -- Voice input interpretation equipment 
100 -- Audio input unit 
101 -- Voice-analysis section 
102 -- Lexical storage section 
103 -- Permutation representation collating section 
104 -- Rhythm information-storage section 
201 -- Voice-analysis section 
202 -- Lexical storage section 
203 -- Wild card expression detecting element 
204 -- Wild card expression storage section 
205 -- Permutation representation collating section 
206 -- Permutation representation storage section 
21 -- CPU 
22 -- RAM 
23 -- Bus 
24 -- Disk drive equipment 
25 -- Storage 
26 -- I/O device 